Laboratory Data Processing

            *************************************************************;
            ** Name:    lab4.sas                                       **;
            ** Author:  T.V. Ramesh                                    **;
            ** Desc:    Program to read lab data from oracle tables    **;
            **          and create reports that will be run weekly     **;
            ** Notes:   - Integrate tables and graphs with data        **;
            **            extraction so that same info is used in both **;
            **          - Modularize program to be run for any lab var **;
            **          - Make program independent of actevent and     **;
            **            dosage group so it can run through out study **;
            **          - Use Proc Report instead of data _null_ for   **;
            **            quick turnaround in standard report format   **;
            **          - Make program adaptable for other studies     **;
            *************************************************************;
            %include 'ro3201195:[rb16074.interim1.adhoc]formats.sas';
            %include 'ro3201195:[rb16074.interim1.adhoc]tf.sas';
            
            options nomacrogen nosymbolgen nomlogic nomprint;
            %let study=RB16074;
            %let compound=RO3201195;
            %let rotate=landscape;  ** portrait,landscape            **;
            %let device=cgmof97l;   ** xcolor,xbw,xgray,win,cgmof97L **;
            %let target=cgmof97l;   ** lj5sips,clj                   **;
            %let gsfname=grafout;   ** grafout or blank              **;
            %let blind=N;
            %let env=;        ** leave blank for prod          **;
            
            ** to reference appropriate CGM device driver, need to call it gdevice0 **;
            libname gdevice0 "&compound:[&study..interim1.data]";
            
            ** to reference randmomization dataset. used separate **;
            ** libref than above for cleaner standards            **;
            libname dat16074 "&compound:[&study..interim1.data]";
            
            ** to suit proc report output **;
            options ls=132 ps=50 formchar='|_---|+|---+=|-/\<>*' nodate center;
            
            ** fileref for gsfname is assigned before graph procedure  **;
            goptions rotate=&rotate device=&device target=&target colors=(black)
                               gsfname=&gsfname gsfmode=replace gsflen=8092;
            
            ***********************************************************;
            ** Assign defaults for plots and charts. Using a common  **;
            ** section cuz making changes would be easier later on   **;
            ***********************************************************;
              ** use this for plotting actual data **;
              axis1      value=(h=.75
                           tick=1 j=c ' '
                           tick=2 j=c 'Screening'
                           tick=3 j=c 'Day 1'
                           tick=4 j=c 'Day 7'
                           tick=5 j=c 'Day 14'
                           tick=6 j=c 'Day 21'
                           tick=7 j=c 'Day 28'
                           tick=8 j=c 'Day 33'
                           tick=9 j=c '')
                        label=(j=center h=.75 "SCHEDULED TIME") major=none minor=none;
            
              ** use this for plotting change variables **;
              axis2    value=(h=.75
                           tick=1 j=c ' '
                           tick=2 j=c 'Day 7'
                           tick=3 j=c 'Day 14'
                           tick=4 j=c 'Day 21'
                           tick=5 j=c 'Day 28'
                           tick=6 j=c 'Day 33'
                           tick=7 j=c ' ')
                    label=(j=center h=.75  "SCHEDULED TIME") major=none minor=none;
            
              ** axis3 (vert axis) and legend are same for scatter and hilo **;
              axis3     value=(h=.75) minor=none label=none;
            
              legend across=8
                       value=(h=.6)
                       shape=symbol(3,.5)
                       label=none
                       origin=(0 in,0.25 in);
            
              ** using commong legends for scatter and hilo for consistency **;
              %macro sym(i,h);
                symbol1 i=&i h=&h  f=marker  v="W" ; ** filled circle   **;
                symbol2 i=&i h=&h  f=swissxb v="X" ; ** bold X          **;
                symbol3 i=&i h=&h  f=markere v="C" ; ** open triangle   **;
                symbol4 i=&i h=&h  f=markere v="P" ; ** open diamond    **;
                symbol5 i=&i h=&h  f=markere v="W" ; ** open circle     **;
                symbol6 i=&i h=&h  f=marker  v="C" ; ** filled triangle **;
                symbol7 i=&i h=&h  f=marker  v="P" ; ** filled diamond  **;
                symbol8 i=&i h=&h  f=markere v="U" ; ** open square     **;
              %mend sym;
            
            options nomacrogen nosymbolgen  nomlogic  nomprint;
            
            ****************************************************************;
            ** Run this macro to extract appropriate records from Normlab **;
            ** containing hi range flag and units. This data is not there **;
            ** in the views corresponding to DCMs. Did not find values    **;
            ** for all lab sasnames in normlab hence had to go to         **;
            ** individual views and not get all data from normlab         **;
            ****************************************************************;
            %macro normext(dcmsub=,labvar1=,labvar2=,labvar3=,dsn=);
              ** generate char string for SQL stmt where clause **;
              %let dcmsubs=%str(%')%upcase(&dcmsub)%str(%');
              %let labvars=%str(%')%upcase(&labvar1)%str(%');
            
              %do i= 2 %to 3;
                %if %length(&&labvar&i) > 0 %then
                  %let labvars = &labvars%str(,%')%upcase(&&labvar&i)%str(%');
              %end;
            
              proc sql;
               connect to oracle;
                ** apparently sample date is same as dcm date. Date is assigned **;
                ** to the DCM that appears on the top of LE1 so haemotalogy     **;
                ** will be assigned the date. Thats in the LE1 view.            **;
                create table &dsn  as
                select * from connection to oracle
                  (select to_number(pt) pt, sasname, dcmname, dcmsubnm,
                        dcmdate, dcmtime, cpevent, actevent,
                        repeatsn, numval, labunit, rngflag, lowrang, hirang
                   from &study$current.normlab
                   where sasname in ( &labvars )
                   and dcmsubnm = &dcmsubs )
                order by pt;
              quit;
            
            %mend;
            %normext(dsn=scrturea,dcmsub=LE1,labvar1=SCRT,labvar2=UREA); ** serum creat & BUN **;
            %normext(dsn=smglob,dcmsub=LE7,labvar1=GLOB2); ** serum microglobulun**;
            
            data normlab;
             set scrturea;
            run;
            
            proc append base=normlab data=smglob force;
            run;
            
            %normext(dsn=ucrt5,dcmsub=LE5,labvar1=UCRT); ** urine creat day 1**;
            %normext(dsn=ucrt6,dcmsub=LE6,labvar1=UCRT); ** urine creat day 28 **;
            %normext(dsn=usmp5,dcmsub=USMP1,labvar1=SMPLWGT); ** urine vol day 1 **;
            %normext(dsn=usmp6,dcmsub=USMP2,labvar1=SMPLWGT); ** urine vol day 28 **;
            
            ***************************************************************;
            ** This macro computes 24 hr creatinine excretion            **;
            ***************************************************************;
            %macro ucrt(day,lenum);
              proc sort data=ucrt&lenum;
                by pt repeatsn;
              run;
            
              proc sort data=usmp&lenum;
                by pt repeatsn;
              run;
            
              data ucrt&day;
                merge ucrt&lenum
                      usmp&lenum (rename=(numval=smplwgt));
                by pt repeatsn;
                retain ucrt24;
                ** Urine sample is expressed in weight (mg) so convert to litre **;
                ** assuming density of urine is 1                               **;
                if first.pt then ucrt24=smplwgt*numval*.001;
                else ucrt24 = ucrt24 + smplwgt*numval*.001;
              run;
            
              proc sort data=ucrt&day;
                by pt descending repeatsn;
              run;
            
              data uc24_&day (keep=pt sasname dcmname dcmsubnm
                               dcmdate dcmtime cpevent actevent
                               repeatsn labunit rngflag lowrang hirang
                               ucrt24 rename=(ucrt24=numval));
                set ucrt&day (drop=numval);
                by pt descending repeatsn;
                if first.pt;
                sasname='UCRT24';
                labunit='MG/24 HRS';
              run;
            
              proc format;
                value repeat
                 1 = '00-06 hrs'
                 2 = '06-12 hrs'
                 3 = '12-24 hrs'
                 ;
              run;
            
              ** preparing to append creatine concentration data **;
              data ucrt&lenum;
               set ucrt&lenum;
               cpevent=trim(left(cpevent)) || ' ' || put(repeatsn,repeat.);
               ** need to check reasonable vals for actevent that are not likely to be assigned **;
               actevent=actevent+repeatsn/1000;
              run;
            
              ** preparing to append urine sample weight data **;
              data usmp&lenum;
               set usmp&lenum;
               cpevent=trim(left(cpevent)) || ' ' || put(repeatsn,repeat.);
               ** need to check reasonable vals for actevent that are not likely to be assigned **;
               actevent=actevent+repeatsn/1000;
              run;
            
              proc append base=normlab data=ucrt&lenum force;
              run;
            
              proc append base=normlab data=usmp&lenum force;
              run;
            %mend;
            %ucrt(1,5);
            %ucrt(28,6);
            
            proc append base=normlab data=uc24_1 force;
            run;
            
            proc append base=normlab data=uc24_28 force;
            run;
            
            ****************************************************************************;
            ** Manipulate urine wt (USMP1,2) and creatinine conc (LE5,LE6) to fit     **;
            ** into normlab struc. This is cuz USMP1 and LE5 are for Day 1 and USMP2  **;
            ** and LE6 are for Day 28, one has to add  ?????????????????????          **;
            ****************************************************************************;
            
            ****************************************************************************;
            ** Extract data for 24hr beta2 microglobulin, 24hr protein, 24hr albumin  **;
            ****************************************************************************;
            %normext(dsn=umglob,dcmsub=LE4,labvar1=UGLOB22,labvar2=UCRT2,labvar3=UALB2);
            
            proc sort data=uc24_1;
             by pt;
            run;
            
            proc sort data=uc24_28;
             by pt;
            run;
            
            ***********************************************************;
            ** Use 24 hr Creatinin from day 1 for Day1 and screening **;
            ** Use 24 hr Creatinin from day 28 for day 28            **;
            ***********************************************************;
            data umglob(drop=uc24_1 uc24_28);
             merge uc24_1(in=in_uc keep=pt cpevent actevent numval rename=(numval=uc24_1))
                   uc24_28(in=in_uc keep=pt cpevent actevent numval rename=(numval=uc24_28))
                   umglob(in=in_norm);
             by pt;
             if in_norm;
             ** uc24_1 and uc24_8 is now in mg/24 hrs, and numval which are ratios **;
             ** for UGLOB22,UCRT2 and UALB2 are in per gm, so take product and     **;
             ** divide by 1000 to get wt per 24 hrs                                **;
             if cpevent='SCREENING' or cpevent='DAY 1' then numval=(numval*uc24_1)/1000;
             else if cpevent='DAY 28' then numval=(numval*uc24_28)/1000;
            run;
            
            proc append base=normlab data=umglob force;
            run;
            
            *******************************************************************;
            ** Compute creatinine clearance based on uc24 and SCRT values    **;
            *******************************************************************;
            data crtclear (drop=uc24_1 uc24_28);
              merge uc24_1(in=in_uc keep=pt cpevent actevent numval rename=(numval=uc24_1))
                    uc24_28(in=in_uc keep=pt cpevent actevent numval rename=(numval=uc24_28))
                    scrturea(in=in_scrt where=(sasname='SCRT'));
             by pt;
             sasname='CRTCLR';
             labunit='ML/MIN';
             if in_scrt;
             if cpevent='DAY 28' then numval=(uc24_28/(numval/8840))/1440;
             else numval=(uc24_1/(numval/8840))/1440;
            run;
            
            proc append base=normlab data=crtclear;
            run;
            
            *******************************************************************;
            ** Obtain Spot protein values if protein dipstick value in LE2   **;
            ** is greater than 1+, want to report this anyway just to confir **;
            ** since this is a char val and everything else is number may    **;
            ** have to use different program and report statements           **;
            *******************************************************************;
            *normext(dsn=uproqlt,dcmsub=LE2,labvar1=UPROQLT);
            
            ** get just spot protein data for now **;
            %normext(dsn=le8prot,dcmsub=LE8,labvar1=UCRT2);   ** urine protein **;
            
            ** need to multiply with appropriate 24hr creatinine excretion **;
            data le8prot (drop=uc24_1 uc24_28);
             merge  uc24_1(in=in_uc keep=pt cpevent actevent numval rename=(numval=uc24_1))
                    uc24_28(in=in_uc keep=pt cpevent actevent numval rename=(numval=uc24_28))
                    le8prot (in=in_norm);
             by pt;
             if in_norm;
             if cpevent='DAY 28' then numval=(numval*uc24_28)/1000;
             else numval=(numval*uc24_1)/1000;
             ** had to rename data from LE8 to someother val cuz data **;
             ** in le4 also has UCRT2 val, and in current ver not     **;
             ** selecting by dcmname                                  **;
             sasname='SPROT';
            run;
            
            proc append base=normlab data=le8prot force;
            run;
            
            proc sql;
             connect to oracle;
              ** get earliest dosing record for DAY1. This will be used  **;
              ** to compare with sampling date to make sure only samples **;
              ** taken before DAY1 dose will be treated as baseline      **;
              create table dosing as
              select * from connection to oracle
                (select to_number(pt) pt, mtbegdc dosedt, mtbegtc dosetm
                 from &study$current.mtdmtd
                 where mtpnam='DAY 1-AM DOSE')
              order by pt;
             disconnect from oracle;
            quit;
            
            proc sort data=normlab;
              by pt;
            run;
            
            ** Check to see if baseline measurements were taken after   **;
            ** baseline date and time. need to have logic to handle     **;
            ** these records. Fortunately none right now! Ask David     **;
            data badbase;
              merge normlab(where=(cpevent='DAY 1'))
                    dosing;
              by pt;
              if dcmdate > dosedt or (dcmdate=dosedt and dcmtime > dosetm);
            run;
            
            ** david asked for study dosing date var which is assessment date **;
            ** minus first dosing date plus 1. need to add that here          **;
            
            proc sql;
             title1 'Number of records where Baseline Lab Data was collected after Day 1';
             select count(*) from badbase;
            quit;
            
            ** If actevent has decimal values then it means test was redone.   **;
            ** For listing report all tests, but for summaries use the latest  **;
            ** test, per David. Hence using existing actevent vals             **;
            data labdose;
              merge normlab(in=in_norm)
                    dat16074.rnd16074 (in=in_rand);
              by pt;
              if in_norm;
              ** need to create new var cuz using pattern for grouping is ok but **;
              ** for sorting not ok since it puts placebo at bottom.             **;
              trtgrp=input(put(pattern,$trt_c.),8.);
            
              ** creating new actevent var just to capture integer vals **;
              ** since for summaries only the last rec will be used     **;
              newact=int(actevent);
            run;
            
            
            ** need to add logic here so that only summary reports use newact   **;
            ** need to get to this after listing is complete                    **;
            proc sort data=labdose;
              by sasname trtgrp pt repeatsn cpevent newact actevent;
            run;
            
            data labdose;
             set labdose;
            *by sasname trtgrp pt repeatsn cpevent newact;
            *if last.newact;
            run;
            
            
            proc sort data=labdose;
              by sasname trtgrp pt cpevent actevent;
            run;
            
            ** must ensure there are not multiple records based on repeatsn at this stage **;
            data labprint;
              set labdose;
              by sasname trtgrp pt cpevent;
              retain baseline;
              ** remove formats or else causing extra space in graph Y axis label **;
              format numval chngbase chngper;
              if first.pt then baseline=.;
              ** in case multiple records for cpevent ie actevent x.01 then use the last value**;
              if cpevent='DAY 1' and last.cpevent then baseline=numval;
              else if cpevent ne 'SCREENING' then
              do;
                chngbase=numval-baseline;
                chngper=100*(chngbase/baseline);
              end;
            run;
            
            proc sort data=labprint;
              by trtgrp pt cpevent actevent;
            run;
            
            ** create dynamic format for actevent. needed to place this **;
            ** here as opposed to formats.sas cuz formats.sas runs at   **;
            ** the begining before these datasets are created           **;
            proc sql;
             create table fmt as
             select distinct(actevent) as start, cpevent as label
             from labprint;
            quit;
            
            data fmt;
             set fmt;
             fmtname='cp_n';
             type='N';
             length decimal $ 10;
             if start-int(start) > 0 then decimal=put(start-int(start),4.3);
             else decimal='';
             if decimal ne ''
               then label=trim(label) || substr(decimal,index(decimal,'.'));
            run;
            
            proc format cntlin=fmt;
            run;
            
            options nomacrogen nosymbolgen nomlogic nomprint;
            **********************************************************************;
            ** Listing Report                                                   **;
            ** Put in macro cuz needed to use macro if then logic in report step**;
            **********************************************************************;
            %macro listrpt(blind=N,fpref=,labtest=,fname=,change=,labvar1=,
                           labvar2=,labvar3=,labvar4=);
            
              options pageno=1 missing=' ' ls=132 ps=50;
            
              %if %upcase(&change)=Y %then %let change=%str(chngbase chngper);
              %else %let change=;
            
              %if %upcase(&labtest)=SERUM %then
                      %let whr = %str((where=(sasname in ('SCRT','UREA','GLOB2','CRTCLR'))));
              %else %if %upcase(&labtest)=URINE %then
                 %let whr=%str((where=(sasname in ('UGLOB22','UCRT2','UALB2','UCRT24'))));
              %else %if %upcase(&labtest)=%str(URINE_CREATININE) %then
                 %let whr=%str((where=(sasname in ('SMPLWGT','UCRT'))));
              %else %if %upcase(&labtest)=%str(SPOT_PROTEIN) %then
                 %let whr=%str((where=(sasname in ('SPROT'))));
              %else %if %upcase(&labtest)=%str(BUN_CREATININ) %then
                 %let whr=%str((where=(sasname in ('SCRT','UREA'))));
            
              %if %upcase(&blind)=Y %then %let fname=_&fname._blinded;
              %else %let fname=_&fname;
            
              filename prntout  "&compound:[&study..interim1.adhoc.tablib]&fpref.lab_list&fname..tab";
            
              proc printto print=prntout new;
              run;
            
              %tf(t,1,%str(&study - Interim Analysis),l);
              %tf(t,2,%str(          &compound),l);
              %tf(t,3,Listing of &labtest Lab Data,c);
              %tf(t,4,%str(Actual Measurement, Change from Baseline and % Change from Baseline (Day 1)),c);
              %tf(t,5,,);
              %tf(t,6,_,x);
              %tf(f,1,_,x);
              %tf(f,2, ,l);
              %tf(f,3,%str(NOTE: Change=Follow-up Measurement - Baseline Measurement (Day 1) , % Change = Change / Baseline (Day 1)),l);
              %tf(f,4,%str(      (R)=Repeated Sample  @ indicates the record is excluded from summary analysis),l);
              %tf(f,5,%str(      H/L=Actual measurement is above/below normal range),l);
              %tf(f,6,,);
              %tf(f,7,,s);
              %tf(f,8,,t);
            
              proc report data=labprint &whr
                          nowd headline headskip ls=132 ps=50 split='*' center;
                column trtgrp pt actevent sasname,(numval rngflag &change          );
                by notsorted;
                break after pt       / skip;
                break after trtgrp   / skip;
                define trtgrp   / 'Treatment*Group' group order=internal format=trt_n. width=15;
                define pt       / 'Patient*Id' group width=10 format=z3.;
                define actevent / 'Scheduled*Time' group format=cp_n. order=internal id width=15;
                define sasname  / across format=$labfmtr. order=internal '- LAB TEST -';
                define numval   / 'Value' format=8.2 width=8 spacing=7;
                define rngflag  / ' ' width=1 format=$rngflg.;
                define chngbase / 'Change' format=8.1 width=8 ;
                define chngper  / '% Change' format=pctfmt. width=8;
                compute pt;
                 %if %upcase(&blind)=Y %then %str(pt=int(1.2469*pt););
                endcompute;
              run;
            
              %tf(t,0,,);
              %tf(f,0,,);
            
              proc printto;
              run;
            
              filename prntout clear;
            %mend listrpt;
            %listrpt(blind=n,fpref=&env,labtest=Serum,fname=serum,change=Y);
            %listrpt(blind=n,fpref=&env,labtest=Urine,fname=urine,change=Y);
            %listrpt(blind=n,fpref=&env,labtest=%str(Urine_Creatinine),fname=urine_creat,change=Y);
            %listrpt(blind=n,fpref=&env,labtest=%str(Spot_Protein),fname=spot_protein,change=Y);
            %listrpt(blind=n,fpref=&env,labtest=%str(BUN_Creatinin),fname=bun_creatinin,change=Y);
            
            %listrpt(blind=y,fpref=&env,labtest=Serum,fname=serum,change=Y);
            %listrpt(blind=y,fpref=&env,labtest=Urine,fname=urine,change=Y);
            %listrpt(blind=y,fpref=&env,labtest=%str(Urine_Creatinine),fname=urine_creat,change=Y);
            %listrpt(blind=y,fpref=&env,labtest=%str(Spot_Protein),fname=spot_protein,change=Y);
            %listrpt(blind=y,fpref=&env,labtest=%str(BUN_Creatinin),fname=bun_creatinin,change=Y);
            
            ******************************************************;
            ** Scatter Plots                                    **;
            ******************************************************;
            %macro scatter(labvar=,calctype=,num=,fpref=);
            
              ** assign symbols for scatter plot **;
              %sym(none,.5);
            
              %if %upcase(&calctype)=ACT %then
              %do;
                %let graphvar=numval;
                %let whr=;
              %end;
              %else
              %do;
                %let graphvar=chngbase;
                %let whr = (where= (cpevent in ('DAY 7','DAY 14', 'DAY 21', 'DAY 28')));
              %end;
            
              proc sort data=labprint(where=(sasname="&labvar")) out=le1plot;
                by actevent;
              run;
            
              ** deliberately having an extra step here to prepare data **;
              data le1plot;
               set le1plot &whr;
               ** to avoide decimals actevent **;
               actevent=int(actevent);
              run;
            
              proc sort data=le1plot;
                by actevent;
              run;
            
              data le1plot1;
               set le1plot;
               by actevent;
               retain newtime intact;
               ** intact is an integer counter for each set of actevents **;
               ** newtime is the corresponding real number               **;
               if _n_=1 then intact=1;
               if first.actevent then
               do;
                 intact=intact+1;
                 newtime=intact;
               end;
               else newtime=newtime+(round(ranuni(1),.01)-.5)/3;
              run;
            
              ** for title in proc report **;
              data _null_;
               call symput('var1',trim(put(upcase("&labvar"),$labfmt.)));
               call symput('var2',trim(put(upcase("&labvar"),$labunit.)));
               call symput('var3',trim(put(upcase("&calctype"),$stat_d.)));
               call symput('fname',compress(put(upcase("&labvar"),$labfmt.)));
              run;
            
              filename grafout  "&compound:[&study..interim1.adhoc.cgmlib]&fpref.&num.scat_&calctype._&fname..cgm";
            
              title1 h=1.25   j=c ls=.25 "Scatter Plot of &var3 Over Time";
              title2 ' ';
              title3 h=.75 j=c "&var1 (&var2)";
              title4 h=.75     " ";
            
              %tf(gf,1,,s);
              %tf(gf,2,,g);
            
              proc gplot data=le1plot1;
                plot &graphvar * newtime  = trtgrp / vaxis=axis3 hminor=0 legend=legend1
                     %if %upcase(&calctype)=CHNG %then %str(haxis=axis2 vref=0 lvref=1;);
                     %else %if %upcase(&calctype)=ACT %then %str(haxis=axis1;);
                format trtgrp trt_n.;
              run;
              quit;
            
              %tf(t,0,,);
              %tf(f,0,,);
            %mend;
            %scatter(labvar=SCRT,calctype=act,num=,fpref=&env);
            %scatter(labvar=UREA,calctype=act,num=,fpref=&env);
            %scatter(labvar=SCRT,calctype=chng,num=,fpref=&env);
            %scatter(labvar=UREA,calctype=chng,num=,fpref=&env);
            
            
            **********************************************************************;
            ** Descriptive Statistics (Univariate) Summary Report               **;
            ** calcvar is the analysis variable - numval for raw data,          **;
            **      chngbase for change from baseline and chngper for percent   **;
            **      change from baseline                                        **;
            ** order Order in which these should show up in the report          **;
            ** keeping both cpevent and actevent for now                        **;
            **  Macro called by univarpt                                        **;
            **********************************************************************;
            %macro univar(calcvar,order,labvar);
              proc sort data=labprint (where=(sasname=upcase("&labvar"))) out=le1univ;
               by trtgrp pt cpevent actevent;
              run;
            
              ** was told by David to use the last actevent in case of multiples **;
              ** for summarization.                                              **;
              data le1univ;
               set le1univ;
               by trtgrp pt cpevent actevent;
               if last.cpevent;
               ** this is cuz in some case pt had only actevent=x.yy **;
               ** so these records were selected based on above      **;
               ** condition and caused probs in transpose at rpttran **;
               actevent=int(actevent);
              run;
            
              proc sort data=le1univ;
                by trtgrp cpevent actevent;
              run;
            
              proc univariate data=le1univ noprint;
                by trtgrp cpevent actevent;
                var &calcvar;
                output out=univar
                       n=n mean=mean median=median min=min max=max
                       stdmean=stdmean q1=q1 q3=q3 p90=p90 p95=p95 p99=p99 ;
              run;
            
            
              options missing='*';
              ** meanse and ci95 datasets will also be used for HiLo plots **;
             ** for raw data and change from baseline create dataset for hilo plots **;
              data
                %if %upcase(&calcvar)=NUMVAL %then
                  %str( hiloval (keep=trtgrp cpevent actevent mean stdmean ci95) );
                %else %if %upcase(&calcvar)=CHNGBASE %then
                  %str( hilochng(keep=trtgrp cpevent actevent mean stdmean ci95)
                        psumchng(keep=trtgrp cpevent actevent p90 p95 p99) );
                   meannse(keep=trtgrp cpevent actevent meannse)
                   medianq(keep=trtgrp cpevent actevent medianq)
                   minmax (keep=trtgrp cpevent actevent minmax)
                   percent(keep=trtgrp cpevent actevent percent)
                   ci95c  (keep=trtgrp cpevent actevent ci95c);
               set univar;
               length meannse medianq minmax percent ci95c $ 20;
               ci95 = tinv(.975,n-1)*stdmean;
               hi=mean+ci95;
               lo=mean-ci95;
               ci95c   = compress(put(ci95,6.1)) || ' (' || compress(put(hi,6.1)) || ', ' || compress(put(lo,6.1)) || ')';
               meannse = compress(put(mean,6.1)) || ' (' || compress(put(n,3.)) || ', ' || compress(put(stdmean,5.1)) || ')';
               medianq = compress(put(median,6.1)) || ' (' || compress(put(q1,6.1)) ||  ', ' || compress(put(q3,6.1)) || ')';
               minmax = '(' || compress(put(min,6.1)) || ', ' || compress(put(max,6.1)) || ')';
               percent = compress(put(p90,6.1)) || ',' || compress(put(p95,6.1)) || ',' || compress(put(p99,6.1));
               ** Screening and Day 1 should show blank for change and percent change **;
               %if %upcase(&calcvar) = CHNGBASE or %upcase(&calcvar) = CHNGPER %then
               %do;
                   if actevent in (1,2) then
                   do;
                     ** reset values to blank instead of * **;
                     ci95c=''; meannse=''; medianq=''; minmax=''; percent='';
                   end;
                %end;
              run;
            
              ************************************************************************************;
              ** need to unstring the dataset, using macro cuz there is no doover in macro lang **;
              ** var2  the munged statistical analysis masures                                  **;
              ** ord2  the order in which they should appear in the report                      **;
              ************************************************************************************;
              %macro dostat(var2,ord2);
                data &var2;
                 set &var2(rename=(&var2=value));
                 ** to insure proper sort order in rpt. no need for this var to be char **;
                 length stat $ 2;
                 stat="&ord2";
                run;
              %mend dostat;
              %dostat(meannse,01);
              %dostat(medianq,02);
              %dostat(minmax,03);
              %dostat(percent,04);
              %dostat(ci95c,05);
            
              data &calcvar;
               set meannse medianq minmax percent ci95c;
               ** this ensures proper sort order in the report **;
               vartyp = ℴ
              run;
            %mend univar;
            
            ***********************************************************************;
            ** Percentile Summary Report                                         **;
            **   Why not have cpevent as well here.. need to chk cpevent/actevent**;
            **   Called from within Desc Stat summary report cuz of dependancy of**;
            **   intermediate datasets created by desc stat                      **;
            ** called from univarpt                                              **;
            ***********************************************************************;
            %macro perrpt(labvar=,fpref=);
             ** where clause for change variables **;
             %let whr = %str(cpevent in ('DAY 7','DAY 14', 'DAY 21', 'DAY 28'));
            
              proc sort data=labprint (where=(sasname=upcase("&labvar"))) out=le1percg;
                by trtgrp pt cpevent actevent;
              run;
            
              ** was told by David to use the last actevent in case of multiples **;
              ** for summarization.                                              **;
              data le1percg;
               set le1percg;
               by trtgrp pt cpevent actevent;
               if last.cpevent;
               ** this is cuz in some case pt had only actevent=x.yy **;
               ** so these records were selected based on above      **;
               ** condition and caused probs in transpose at rpttran **;
               actevent=int(actevent);
              run;
            
              proc sort data=le1percg;
                by trtgrp actevent;
              run;
            
              proc sort data=psumchng;
                by trtgrp actevent;
              run;
            
              data range;
                merge le1percg(where=(sasname="&labvar" and &whr ))
                      psumchng(where=( &whr ));
                by trtgrp actevent;
                ** count records which hit High water mark **;
                if rngflag='H' then hc=1;
                else hc=0;
            
                ** count records where there was a positive change **;
                if chngbase > 0 then cc=1;
                else cc=0;
            
                ** count records which cross percentile marks **;
                if chngbase > p90 then n90=1;
                else n90=0;
                if chngbase > p95 then n95=1;
                else n95=0;
                if chngbase > p99 then n99=1;
                else n99=0;
              run;
            
              proc summary data=range nway missing;
               class trtgrp actevent;
               var hc cc n90 n95 n99;
               output out=rngsum sum=;
              run;
            
              data prn;
                merge rngsum(in=in_sum rename=(_freq_=num) drop=_type_)
                      psumchng(rename=(p90=val90 p95=val95 p99=val99));
                by trtgrp actevent;
                if in_sum;
                ** rename vars to val above cuz p_ denotes percentage here **;
                array p(5) p_hc p_cc p_n90 p_n95 p_n99;
                array v(5) hc cc n90 n95 n99;
                do i = 1 to 5;
                  p(i)=v(i)/num;
                end;
              run;
            
              data _null_;
               call symput('var1',trim(put(upcase("&labvar"),$labfmt.)));
               call symput('fname',compress(put(upcase("&labvar"),$labfmt.)));
              run;
            
              options pageno=1 missing=' ' ls=132 ps=50;
            
              filename prntout  "&compound:[&study..interim1.adhoc.tablib]&fpref.lab_per_sum_&fname..tab";
            
              proc printto print=prntout new;
              run;
            
              %tf(t,1,%str(&study - Interim Analysis),l);
              %tf(t,2,%str(          &compound),l);
              %tf(t,3,%str(Summary of Percentiles for &var1 Lab Data),c);
              %tf(t,4,Change from Mean of Baseline,c);
              %tf(t,5,,);
              %tf(t,6,_,x);
              %tf(f,1,_,x);
              %tf(f,2,,);
              %tf(f,3,,s);
              %tf(f,4,,t);
            
              proc report data=prn nowd headline headskip center ls=132 ps=50 split='*';
                column trtgrp actevent num
                    (' > High' 'Normal Range' hc p_hc)
                    ('Lab Result' 'Increased' 'Post Dose' cc p_cc)
                    ('> 90% Change' 'from Baseline' val90 n90 p_n90)
                    ('> 95% Change' 'from Baseline' val95 n95 p_n95)
                    ('> 99% Change' 'from Baseline' val99 n99 p_n99);
                break after trtgrp   / skip;
                break after actevent / skip;
                define trtgrp   / 'Group' group order=internal format=trt_n.  width=12;
                define actevent / 'Event' group order=internal format=cp_n.    width=10;
                define num      / '#*Subj' width=4 format=4.;
                define hc       / 'N' width=3 spacing=3;
                define p_hc     / '%' width=6 format=percent6.1;
                define cc       / 'N' width=3 spacing=3;
                define p_cc     / '%' width=6 format=percent6.1;
                define val90    / '90%' width=6 format=5.1 spacing=4;
                define n90      / 'N' width=3;
                define p_n90    / '%' width=7 format=percent6.1;
                define val95    / '95%' width=6 format=5.1 spacing=4;
                define n95      / 'N' width=3;
                define p_n95    / '%' width=7 format=percent6.1;
                define val99    / '99%' width=6 format=5.1 spacing=4;
                define n99      / 'N' width=3;
                define p_n99    / '%' width=7 format=percent6.1;
              run;
            
              %tf(t,0,,);
              %tf(f,0,,);
            
              proc printto;
              run;
            
              filename prntout clear;
            %mend;
            
            ****************************************************************;
            ** HiLo Plots                                                 **;
            ** Called for within univarpt                                 **;
            ****************************************************************;
            %macro hilo(labvar=,calctype=,stat=,num=,fpref=);
            
              ** assign symbols for hilo plot **;
              %sym(hilo,.5);
            
              filename grafout  "&compound:[&study..interim1.adhoc.cgmlib]&fpref.&num.hilo_&calctype._&stat._&labvar..cgm";
            
              %if %upcase(&calctype)=ACT %then
              %do;
                %let graphvar=numval;
                %let whr=;
              %end;
              %else
              %do;
                %let graphvar=chngbase;
                %let whr = (where= (cpevent in ('DAY 7','DAY 14', 'DAY 21', 'DAY 28', 'DAY 33(TERM)')));
              %end;
            
              ** dropping cpevent here for now.. need to establish cpevent/actevent logic **;
              data sumstat(drop=mean ci95 stdmean cpevent);
               %if %upcase(&calctype)=ACT %then %str(set hiloval &whr;);
               %else %if %upcase(&calctype)=CHNG %then %str(set hilochng &whr;);
               cilo= mean - &stat;
               cihi= mean + &stat;
              run;
            
              proc sort data=sumstat;
                by actevent;
              run;
            
              ** deliberately having an extra step here to prepare data **;
              data sumstat;
               set sumstat;
               ** to avoide decimals actevent **;
               actevent=int(actevent);
              run;
            
              proc sort data=sumstat;
                by actevent;
              run;
            
              data sumstat1;
               set sumstat;
               by actevent;
               retain newtime intact;
               ** intact is an integer counter for each set of actevents **;
               ** newtime is the corresponding real number               **;
               if _n_=1 then intact=1;
               if first.actevent then
               do;
                 intact=intact+1;
                 newtime=intact;
               end;
               else newtime=newtime+(round(ranuni(1),.01)-.5)/3;
               *newtime=actevent+(.08*(trtgrp-1));
              run;
            
              proc sort data=sumstat1;
                by trtgrp actevent newtime;
              run;
            
              proc transpose data=sumstat1(drop=intact) out=sumtran;
                by trtgrp actevent newtime;
              run;
            
              ** for title in proc report **;
              data _null_;
               call symput('var1',trim(put(upcase("&labvar"),$labfmt.)));
               call symput('var2',trim(put(upcase("&labvar"),$labunit.)));
               call symput('var3',trim(put(upcase("&calctype"),$stat_d.)));
               call symput('title2',trim(put(upcase("&stat"),$stat_c.)));
              run;
            
              title1 h=1.25 j=c "Plot of Mean of &var3 &title2";
              title2 ' ';
              title3 h=.75 j=c "&var1 (&var2)";
              title4 h=.75 j=c " ";
            
              %tf(gf,1,,s);
              %tf(gf,2,,g);
            
              proc gplot data=sumtran;
                plot col1 * newtime = trtgrp /legend=legend1 vaxis=axis3 hminor=0
                     %if %upcase(&calctype)=CHNG %then %str(haxis=axis2 vref=0 lvref=1;);
                     %else %if %upcase(&calctype)=ACT %then %str(haxis=axis1;);
                format trtgrp trt_n. ;
              run;
              quit;
            
              %tf(t,0,,);
              %tf(f,0,,);
            %mend;
            
            ****************************************************************;
            ** Descriptive statistics summary report                      **;
            ** Calls macro perrpt and univar                              **;
            ****************************************************************;
            %macro univarpt(labvar=,fpref=);
              %univar(numval,1,&labvar);
              %univar(chngbase,2,&labvar);
              %univar(chngper,3,&labvar);
            
              data report;
                set numval chngbase chngper;
              run;
            
              proc sort data=report;
                by trtgrp vartyp stat;
              run;
            
              ** Had to resort to proc transpose because using actevent as across to display **;
              ** char values did not work out. Although that would be the better way to do   **;
              ** must check with SI about this                                               **;
              proc transpose data=report prefix=trt out=rpttran;
                by trtgrp vartyp stat;
                id actevent;
                var value;
              run;
            
              options pageno=1 ls=145 ps=50;
              data _null_;
               ** for title in proc report **;
               call symput('var1',compress(put(upcase("&labvar"),$labfmt.)));
               ** for naming file **;
               call symput('fname',compress(put(upcase("&labvar"),$labfmt.)));
              run;
            
              filename prntout  "&compound:[&study..interim1.adhoc.tablib]&fpref.lab_stat_sum_&fname..tab";
            
              proc printto print=prntout new;
              run;
            
              %tf(t,1,%str(&study - Interim Analysis),l);
              %tf(t,2,%str(          &compound),l);
              %tf(t,3,%str(Summary of Descriptive Statistics for &var1 Lab Data,),);
              %tf(t,4,%str(Change and % Change from Baseline (Day 1) From Univariate Analysis),);
              %tf(t,5,_,x);
              %tf(f,1,_,x);
              %tf(f,2,%str(* indicates statistic could not be computed),l);
              %tf(f,3,,s);
              %tf(f,4,,t);
            
              ** need to add logic to automatically generate transposed vars **;
              ** based on actevent values                                    **;
              proc report data=rpttran nowd headline headskip ls=132 ps=50 split='*';
                column trtgrp vartyp stat trt1 trt4 trt6 trt11 trt13 trt15;
                break after trtgrp   / skip;
                break after vartyp   / skip;
                define trtgrp   / 'Group' group order=internal format=trt_n.  width=12;
                define vartyp   / ' ' group width=10 order=internal format=vartyp.;
                define stat     / ' Statistic' group id order=internal format=$stat_e. width=18;
                define trt1       / 'Screening' width=20;
                define trt4       / 'Day 1' width=25;
                define trt6       / 'Day 7' width=25;
                define trt11      / 'Day 14' width=25;
                define trt13      / 'Day 21' width=25;
                define trt15      / 'Day 28' width=25;
              run;
            
              %tf(t,0,,);
              %tf(f,0,,);
            
              proc printto;
              run;
            
              filename prntout clear;
            
              ** call macro for percentile summary report **;
              %perrpt(labvar=&labvar,fpref=&env);
            
              ** call macro for HiLo plots **;
              %hilo(labvar=&labvar,calctype=act,stat=ci95,num=,fpref=&env);
              %hilo(labvar=&labvar,calctype=act,stat=stdmean,num=,fpref=&env);
              %hilo(labvar=&labvar,calctype=chng,stat=ci95,num=,fpref=&env);
              %hilo(labvar=&labvar,calctype=chng,stat=stdmean,num=,fpref=&env);
            %mend;
            %univarpt(labvar=SCRT,fpref=&env);
            %univarpt(labvar=UREA,fpref=&env);
            
            ** reset formchar values **;
            options formchar='|----|+|---+=|-/\<>*';
            
            
            
            
          

Survey Data Analysis

                 ****************************************************************;
                 ** Author: T.V. Ramesh                                        **;
                 ** Purpose: Program to analyze data gathered from surveys.    **;
                 ** Description: First generate dummy data using random        **;
                 **              number generation techniques and then create  **;
                 **              representative output using Univariate and    **;
                 **              multivariate analysis techniques in SAS/Stat  **;
                 **              Program to serve as preliminary exploration   **;
                 **              of data, to be followed by several iterations **;
                 **              employing detailed procedure options          **;
                 ****************************************************************;
                 libname out 'f:\sun';
                 *************************************************************;
                 ** Variables correspond to coded question names in survey  **;
                 ** instrument. Generate dummy data for each respondent,    **;
                 ** within the range of possible values, using random       **;
                 ** number functions. Eventually this dataset will be       **;
                 ** replaced by actual data keyed in from survey instrument **;
                 *************************************************************;
                 data out.xmas(drop=i);
                   length default=4;
                   ** define questions as arrays cuz random number generation **;
                   ** group according to values questions can take based on   **;
                   ** likert scale responses.                                 **;
                   array A1_2(2)  sex parent;
                   array A1_3(7)  famtalk famnbr workhrs enjtime famtime jobtime pressure;
                   array A1_4(3)  age income giftcard;
                   array A1_5(13) enjtoday enjhope crime responsb comrcial remember enjgive
                                  enjdsply enjmrchn enjdecor enjcost enjendrs enjads;
                   array A0_5(2)  feelgood feelbad;
                   label
                   id       = 'Subject ID'
                   age      = 'Age'
                   sex      = 'Sex'
                   parent   = 'Parent'
                   income   = 'Combined family income'
                   enjtoday = '1  Enj now / 10 yrs ago'
                   enjhope  = '2  Hope to enj / last yr'
                   feelgood = '3  Positive feelings'
                   feelbad  = '4  Negative feelings'
                   giftcard = '5  Send gifts & cards'
                   famtalk  = '6  Talk to family,friend'
                   famnbr   = '7  # of family members'
                   genattid = '8  General attitude'
                   crime    = '9  Goodwill / crime figs'
                   responsb = '10 Responsblt as barrier'
                   comrcial = '11 Commercial sentiment'
                   remember = '12 Remember celebration'
                   workhrs  = '13 Work hrs VS last yr'
                   enjtime  = '14 Enj time / last yr'
                   famtime  = '15 Family time / last yr'
                   jobtime  = '16 Job time vs last yr'
                   pressure = '17 Pressure vs last yr'
                   enjgive  = '18 Enj gift giving'
                   enjdsply = '19 Enj comrcl displays'
                   enjmrchn = '20 Enj merchandise'
                   enjdecor = '21 Enj home decors'
                   enjcost  = '22 Enj cost of living'
                   enjendrs = '23 Enj celeb endorsmnt'
                   enjads   = '24 Enj media ads'
                   enjoymnt = 'Total enjoyment';
                   ** for each id create appropriate random numbers **;
                   do id = 1 to 210;
                     ** 1 - 2 random numbers;
                      do i = 1 to 2;
                         do until ( 1 <= a1_2(i) <= 2);
                           a1_2(i) = int(1.5 + sqrt(5)*rannor(0));
                         end;
                      end;
                     ** 1 - 3 random numbers;
                      do i = 1 to 7;
                         do until ( 1 <= a1_3(i) <= 3);
                           a1_3(i) = int(2.0 + sqrt(5)*rannor(0));
                         end;
                      end;
                     ** 1 - 4 random numbers;
                      do i = 1 to 3;
                         do until ( 1 <= a1_4(i) <= 4);
                           a1_4(i) = int(2.5 + sqrt(5)*rannor(0));
                         end;
                      end;
                     ** 1 - 5 random numbers;
                      do i = 1 to 13;
                         do until ( 1 <= a1_5(i) <= 5);
                           a1_5(i) = int(3.0 + sqrt(5)*rannor(0));
                         end;
                      end;
                     ** 0 - 5 random numbers;
                      do i = 1 to 2;
                         do until ( 0 <= a0_5(i) <= 5);
                            a0_5(i) = int(2.5 + sqrt(5)*rannor(0));
                         end;
                      end;
                     ** 0 - 7 random numbers;
                     do until ( 0 <= genattid <= 7);
                        genattid = int(3.5 + sqrt(5)*rannor(0));
                     end;
                     enjoymnt = enjtoday + enjhope;
                     output;
                   end;
                 run;
                 
                 options ls=132 ps=55 nodate nonumber;
                 
                 ** proc means for simple statistics, all vars processed **;
                 proc means data=out.xmas n range var mean min max std;
                   title 'Simple univariate statistics';
                 run;
                 ** Chi Square test  **;
                 proc freq;
                   tables age * giftcard;
                    title 'Chi Square';
                 run;
                 
                 ** t test  **;
                 proc ttest;
                   class sex;
                   var  giftcard feelgood feelbad genattid
                        famtalk famnbr workhrs enjtime famtime jobtime
                        pressure crime responsb comrcial remember enjgive
                        enjdsply enjmrchn enjdecor enjcost enjendrs enjads;
                   title 't-test';
                 run;
                 ** correlation of enjoyment vars (dependent) with all others **;
                 proc corr data=out.xmas nosimple;
                   var  sex parent age income giftcard feelgood feelbad
                        famtalk famnbr workhrs enjtime famtime jobtime genattid
                        pressure crime responsb comrcial remember enjgive
                        enjdsply enjmrchn enjdecor enjcost enjendrs enjads;
                   with enjtoday enjhope enjoymnt;
                   title 'Corelation analysis';
                 run;
                 
                 ** factor analyze all independent vars, except demog **;
                 proc factor data=out.xmas rotate=varimax scree score;
                   var  giftcard feelgood feelbad genattid
                        famtalk famnbr workhrs enjtime famtime jobtime
                        pressure crime responsb comrcial remember enjgive
                        enjdsply enjmrchn enjdecor enjcost enjendrs enjads;
                   title 'Factor Analysis';
                 run;
                 
                 ** regress dependent vars (enjoyment vars) with all others **;
                 proc reg data=out.xmas;
                   model enjoymnt enjtoday enjhope =
                           giftcard feelgood feelbad genattid
                           famtalk famnbr workhrs enjtime famtime jobtime
                           pressure crime responsb comrcial remember enjgive
                          enjdsply enjmrchn enjdecor enjcost enjendrs enjads;
                   title 'Regression Analysis';
                  run;
                 
                

Healhcare Data Processing

              ****************************************************************;
              **                                                            **;
              ** PROGRAM NAME: nsf_out.sas    AUTHOR                        **;
              ** AUTHOR: T.V. Ramesh                                        **;
              ** DATE: 21Nov97                                              **;
              **                                                            **;
              ** PURPOSE: Write out the data to the NSF1500 specification.  **;
              **                                                            **;
              ****************************************************************;
              
              ** need to have sasautos lib in autocall path to use left func **;
              filename autos ('/u1/sas612/sasautos' '../macro' '$sastools/macro' '$enctools/macro');
              options sasautos=autos;
              
              ***************************************************************;
              ** Macro nsf_out reads enc dataset and creates flatfiles.    **;
              ** This process was designed to be invoked from a macro cuz  **;
              ** it has to be repeated for several quarters                **;
              ** It also generates a summary report and can ftp files      **;
              ** Parameters are -                                          **;
              ** testprod   TEST for test, PROD for prod as specified in   **;
              **            mapping document. Default is TEST.             **;
              ** report     Y genertes summary report at end of processing **;
              **            Default = Y.                                   **;
              ** ftp        sun=sends file to local directory /ftptest     **;
              **            stratus=sends file to east coast               **;
              **            Default = N, file not sent. Use lower case     **;
              **            could not invoke autocall macro fun %lowcase   **;
              ** numfiles   Number of output files to generate. By default **;
              **            all files are generated numfiles=max. However  **;
              **            can specify 1 thru n to limit number of files. **;
              ** numrecs    Number of records in output file. This is      **;
              **            50000 as default per Dan Smith. Change to      **;
              **            smaller number to create smaller files for test**;
              **            Default=49,000 per Dave Overstreet 5/6/98      **;
              ** log        Extent of log kept in a permanent SAS dataset  **;
              **            F=info on files created and sent               **;
              **            C=info on claims created and sent              **;
              **            B=info on both files and claims (default)      **;
              ** fileloc    Location of output flat files. Default is      **;
              **            /nsf/data/. Note that the name of files is     **;
              **            nsf&start&end_n, where &start is starting qtr  **;
              **            &end is ending qtr depending on qtrs and date  **;
              **            values specified in nsf.sas program. If only   **;
              **            1 qtr is specified then only start date is     **;
              **            used in file name. Keeps track of data in file **;
              ** filestrt   The starting file number for this job. Per Ed  **;
              **            Black each submission should have distinct     **;
              **            file numbers from previous submission. For test**;
              **            this is not an issue. Specify max to get prev  **;
              **            file number (default).Specify 0 to start from 1**;
              **            Specify Delete to delete all records and start **;
              **            from 1, old copy is saved as flogn where       **;
              **            n=1 thru 9999.                                 **;
              ***************************************************************;
              %macro nsf_out(testprod=TEST,
                             report=Y,
                             ftp=N,
                             numfiles=max,
                             numrecs=49000,
                             log=B,
                             fileloc=/nsf/data/,
                             filestrt=max
                           );
                ** make this the default **;
                %if %upcase(&numrecs)=MAX %then %let numrecs=50000;
              
                options missing='';
              
                ************************************************************;
                ** assign start and end date for quarters being processed **;
                ** based on date and qtrs macro var specified in nsf.sas  **;
                ** This is used in constructing output file name. Also    **;
                ** create macro var to represent date for dataset name    **;
                ** Note that cuz of name length limit diff vars are needed**;
                ************************************************************;
                data _null_;
                 call symput('start',compress('q' || qtr("&date"d) || year("&date"d)));
                 %if &qtrs > 1 %then
                 %do;
                   call symput('end',compress('_' || 'q' ||
                               qtr(intnx('qtr',"&date"d,&qtrs-1)) ||
                               year(intnx('qtr',"&date"d,&qtrs-1))));
                   call symput('dsn',compress('q' || qtr("&date"d) || put("&date"d,year2.) ||
                               'q' || qtr(intnx('qtr',"&date"d,&qtrs-1)) ||
                                put(intnx('qtr',"&date"d,&qtrs-1),year2.)));
                 %end;
                 %else
                 %do;
                   call symput('end','');
                   call symput('dsn',compress('q' || qtr("&date"d) || put("&date"d,year2.)));
                 %end;
                run;
              
                %if &exist le 1 %then
                %do;
                  *******************************************************;
                  ** Intermediate data step to perform calculations on **;
                  ** variables. Convert to V99 format, convert char    **;
                  ** labchrg to numeric etc., reassign lengths etc.    **;
                  ** Ideally this should be done before the data gets  **;
                  ** here. Take not of the vars prefixed by new_ cuz   **;
                  ** they are done to accomodate difference in NSF1500 **;
                  ** requirement versus what is in orignial datasets   **;
                  *******************************************************;
                  data enc(drop=labchrg mdoth old_sex
                              fn1-fn5 tempname newfirst newmi i j initials
                           rename=(lchrg=labchrg));
                    set temp.enc(where=(prov ^= '')
                                rename=(memdodth=mdoth  memsex=old_sex)
                                drop=dob name
                               );
                    length services 4 memdodth new_dos $ 8 memsex diag_pt1 $ 1
                            newfirst $ 12 newmi $1;
                    array fn(5) $ 15 fn1-fn5;
              
                    ** zero out -ve charge per Dan Smith **;
                    if callowed < 0 then callowed = 0;
              
                    ** multiply by 100 cuz of COBOL assumed decimal spec **;
                    callowed=callowed*100;
                    ** max line charges allowed is 9999999 cuz 7 byte field **;
                    ** with assumed decimals                                **;
                    if callowed > 9999999 then callowed=9999999;
              
                    ** convert labchrg to number cuz it was stored as char **;
                    lchrg=put(labchrg,8.)*100;
              
                    ** convert numbers missing to 0 cuz options missing blank later on **;
                    if lchrg=. then lchrg=0;
                    if services=. then services=0;
                    if callowed=. then callowed=0;
              
                    ** put dates in appropriate formats **;
                    memdodth=compress(put(mdoth,yymmdd10.),'-');
              
                    ** for dos dont use old_xxx instead create a new variable **;
                    ** to output to file. This is cuz dos needs to be retained**;
                    ** as a number for later merging with original datasets in**;
                    ** the log steps of this macro                            **;
                    new_dos=compress(put(dos,yymmdd10.),'-');
              
                    ** save 1 byte in sex variable **;
                    memsex=compress(old_sex);
              
              
                    ************************************************************;
                    ** The value of diag_pt1 is supposed to indicate          **;
                    ** which is the most important diagnosis, for instance if **;
                    ** it is 1 then diag1 is most important, if diag_pt1=2    **;
                    ** then diag2 is most important.  However in encounter    **;
                    ** data diag1 is always most important and since diag_pt2 **;
                    ** thru diag_pt4 are optional, it makes it simple         **;
                    ************************************************************;
                    if diag1 ^= '' then diag_pt1 = '1';
              
                    **** assign new member and provider codes based on ****;
                    **** formats per Brit 12/17/97. Keep as diff vars  ****;
                    **** cuz need to merge back with old provider &    ****;
                    **** member codes, like for dos. Dont need fmt     ****;
                    **** anymore per ken 5/1/98 since encounter data   ****;
                    **** sets already have new crosswalked values      ****;
              
                    * renprov=put(renprov,$provxkk.);
                    * refprov=put(refprov,$provxkk.);
              
                    *****************************************************;
                    ** Fix problems with first name fields. extract    **;
                    ** middle initial whenever possible                **;
                    **                                                 **;
                    ** Only if patient middle initial is blank.        **;
                    ** Do not remove *, quotes, ( comma period etc     **;
                    **   may be helpful for interpreting undecipherable**;
                    **   strings in name field                         **;
                    ** Do not remove sr. jr. sr jr I II II IV etc      **;
                    ** Only if initials come after name                **;
                    ** Not for dual first names like mary anne         **;
                    ** Name defined as 2 consecutive alphabets         **;
                    ** Do not blank out appearance of MI in FN cuz     **;
                    **    there could be people with mi and fn starting**;
                    **    with same alphabet.                          **;
                    ** If middle initial appears in first name then    **;
                    **     blank it out.                               **;
                    ** Examples:                                       **;
                    **  E. Lisa = E Lisa                               **;
                    **  Annika K. = Annika, MI=K                       **;
                    **  William B.G = William B G, if non blank        **;
                    **          mi then mi=B                           **;
                    *****************************************************;
                    ** do this only if patient middle initial is missing**;
                    ** in some cases its says NEWBORN B etc for newborn **;
                    ** boy. so need to avoid them as well.              **;
                    if memmi='' and compress(memfn) ^= '' and index(upcase(memfn),'NEWBORN')=0 then
                    do;
                      ** left align name first **;
                      tempname=left(memfn);
                      ** replace ., etc with blanks **;
                      tempname=left(translate(tempname,' ','.*""'));
                      ** need separate stmt for single and double quote **;
                      ** to avoid problems with unbalanced quotes       **;
                      * tempname=left(translate(tempname,' ',"''"));
                      ** break up names into different parts **;
                      i=1;
                      do while (index(trim(tempname),' ') > 0 and i <= 5);
                        fn(i)=substr(tempname,1,index(tempname,' ')-1);
                        tempname=left(substr(tempname,index(tempname,' ')+1));
                        i+1;
                      end;
                      ** this assignment needed cuz loop exits without **;
                      ** assigning the last string in first name and   **;
                      ** the value of i at the end of loop is always   **;
                      ** the next value in the array                   **;
                      fn(i)=tempname;
                      ** initials checks to see if there are names like**;
                      ** A J where perhaps both initials are to be     **;
                      ** treated as first name                         **;
                      initials=1;
                      ** remove extraneous chars. Use i instead of 5 **;
                      ** cuz i would represent number of non blank   **;
                      ** vars and so makes it a bit more efficient   **;
                      do j = 1 to i;
                        ** if there is a name in paren, then delete that part **;
                        if index(fn(j),'(') > 0 then fn(j)='';
                        ** if all initials then do not use as mi **;
                        if length(compress(fn(j))) > 1 then initials=0;
                      end;
                      ** always take first single byte char and make it mi **;
                      do j=1 to i;
                        if length(compress(fn(j)))=1 and newmi='' and initials=0 then
                        do;
                           newmi=fn(j);
                           fn(j)='';
                        end;
                        newfirst=left(trim(left(newfirst)) || ' ' || trim(left(fn(j))));
                      end;
                      ** reassign first name only if middle initial was extracted **;
                      if newmi ^= '' then
                      do;
                          memmi=newmi;
                          memfn=newfirst;
                      end;
                    end;
                  run;
              
                  **** sort to appear after data step since mem & prov ****;
                  **** are reassigned by way of formats (crosswalk)    ****;
                  proc sort data=enc;
                    by prov mem dos proc;
                  run;
                %end; **end for exist=0  **;
              
                %if &exist le 2 %then
                %do;
                  *** run the macro that assigns constants needed for ***;
                  *** all record types                                ***;
                  %nsf_000;
              
                  ** get starting file number from testlog, by looking at files  **;
                  ** sent so far to host machine.                                **;
                  %if %upcase(&filestrt)=MAX  %then
                  %do;
                    proc sql;
                     reset noprint;
                     select max(filrecno) into :filestrt
                     from sasdb.&testprod.log;
                    quit;
                  %end;
                  %if %upcase(&filestrt)=DELETE %then
                  %do;
                     ** create name for copy of old log file **;
                     data _null_;
                       length dsn $ 100 n exist 4;
                       exist=1;
                       n=0;
                       do until(exist=0);
                         n=n+1;
                         dsn=compress('sasdb.flog' || n);
                         exist=exist(dsn);
                       end;
                       call symput('oldlog',dsn);
                     run;
              
                     data &oldlog;
                      set sasdb.&testprod.log;
                     run;
              
                     data sasdb.&testprod.log;
                       length filrecno 4;
                       if _n_=1 then stop;
                     run;
              
                     %let filestrt=0;
                  %end;
                  ** as a measure of caution if blank parm specified **;
                  %if &filestrt= %then %let filestrt=0;
              
                  ************************************************************;
                  ** Put stmts for individual records are in separate macros**;
                  ** Note that output stmt appears here not in those macros **;
                  ** cuz it is easier to keep tabs on records being output  **;
                  ** Naming convention for counter vars XXXYYYZZ            **;
                  ** XXX - represents level for count clm=claim,            **;
                  **       bat=batch, fil=file, rec=record,                 **;
                  **       det=claim detail level                           **;
                  ** YYY - represents item that is being counted. It        **;
                  **       could be clm=claim etc..as indicated above       **;
                  **       plus aa0=for aa0 records, ba0=for BA0            **;
                  **       records etc.                                     **;
                  ** ZZ  - Indicates whether variable is used as a          **;
                  **       serial number or as a representaion of           **;
                  **       total counts ct=counts, no=number                **;
                  ** Dataset is saved in the format qabbqxyy. It could come **;
                  ** in handy for debugging, investigation etc. Perhaps may **;
                  ** not be necessary to keep eventually                    **;
                  ** a - starting quarter number                            **;
                  ** bb - 2 digit start year (cant use 4 cuz of name limit  **;
                  ** x - ending quarter number                              **;
                  ** yy - 2 digit end year                                  **;
                  ************************************************************;
                  data sasdb.&dsn(compress=yes);
                    set enc end=eof;
                    by  prov mem dos proc;
                    length newfile eof 3
                           filrecno batrecno clmrecno clmrecct clmdetct 4
                           file_nam $ 100 file $ 25
                           default=8;
                    retain file_nam newfile file pat
                           filrecno fildetct filrecct filclmct filclmcg
                           batrecno batdetct batrecct batclmct batclmcg
                           clmrecno clmcxxct clmdxxct clmexxct clmfxxct clmclmcg;
                    label dsnrecct = 'Count of all records in file';
              
                    **********************************************************;
                    **                                                      **;
                    ** Foll is a list of important counter variables.       **;
                    **                                                      **;
                    ** clmrecno Is the number assigned to a claim for a     **;
                    **          member. A provider can have many members    **;
                    **          and each member can have many claims.       **;
                    **          Used to construct patient control number.   **;
                    ** clmrecct Is the number of records in a claim. Not    **;
                    **          same as clmdetct cuz this counts all records**;
                    **          that are output not just detail records.    **;
                    ** clmdetct Is the number of claim detail records in a  **;
                    **          claim. This is not same as batdetct cuz this**;
                    **          is reset to 1 whenever a new claim is       **;
                    **          encountered, and there can be multiple      **;
                    **          claims in a batch.                          **;
                    ** batrecct Keeps track of number of records, its not   **;
                    **          the same as _n_ cuz its reset to 1 when a   **;
                    **          new batch is encountered.                   **;
                    ** batrecno Its the number assigned to a batch. There   **;
                    **          can be a max of 9999 batches in a file.     **;
                    **          Needed, its output to file.                 **;
                    ** filrecno This keeps track of the file number being   **;
                    **          output since there could be multiple files  **;
                    **          cuz of maximum record limitations. Needed   **;
                    **          since its used to construct file name and   **;
                    **          its also output                             **;
                    ** filrecct Number of records in file - this is not the **;
                    **          same as actual records cuz AA0,BA0,ZA0 recs **;
                    **          are not included in this count, per spec.   **;
                    ** dsnrecct This counts every record in the file        **;
                    **          including all headers etc. ie everytime a   **;
                    **          a record is output. Not the same as _N_ cuz **;
                    **          it is set to 1 for a new file. Used to keep **;
                    **          track of max record limit. Not output to    **;
                    **          flat file.                                  **;
                    **********************************************************;
                    **********************************************************;
                    ** note that this do group is separated from aa0 header **;
                    ** do group because file statement needs to be executed **;
                    ** for every record and the seq of do groups is imp.    **;
                    ** start of new file                                    **;
                    **********************************************************;
                    if _n_=1 or newfile=1 then
                    do;
                      if _n_=1 then filrecno=&filestrt+1;
                        else filrecno+1;
                      %if %upcase(&numfiles) ^= MAX %then
                        %str(if filrecno > &filestrt+&numfiles then stop;);
                      ** file variable is saved to dataset, file_nam is not since **;
                      ** it appears in filevar parm of file stmt                  **;
                      file=compress('nsf' || put(filrecno,6.) || '_' || "&start" || "&end");
                      file_nam=compress("&fileloc" || file);
                    end;
                    ************************************************************;
                    ** this stmt needs to appear after file_nam is assigned   **;
                    ** since its used in the construction of output file      **;
                    ** name and appear outside do block. Filename flat is a   **;
                    ** dummy cuz filename assigned by filevar parm            **;
                    ************************************************************;
                    file flat filevar=file_nam lrecl=320;
                    **** Start of new file ****;
                    if _n_=1 or newfile=1 then
                    do;
                      **** Initialize file level counter vars ****;
                      newfile=0;
                      batrecno=0;
                      filclmct=0;
                      filrecct=0;
                      fildetct=0;
                      filclmcg=0;
                      dsnrecct=1;
                      %nsf_aa0;
                      output;
                    end;
                      **********************************************************;
                      ** Start of new batch. Note that batch number changes   **;
                      ** if a new provider. Also note that just first.prov    **;
                      ** is not enuf cuz for the same provider you would have **;
                      ** to start new batch, when you come to 50000 rec limit **;
                      **********************************************************;
                      if first.prov or batrecno=0 then
                      do;
                        batrecno+1;
              
                        **** Compute batch level counter vars ****;
                        dsnrecct+1;
                        **** Ed Black advised to include BA0 records ****;
                        **** in count on 12/15/97                    ****;
                        filrecct+1;
              
                        **** Inititlize batch level counter vars ****;
                        clmrecno=0;
                        ******************************************************;
                        ** reset batdetct= 0 cuz now you restart counting    **;
                        ** process to keep track of 50 claim records limit  **;
                        ******************************************************;
                        batdetct=0;
                        batrecct=0;
                        batclmct=0;
                        batclmcg=0;
              
                        **** Batch header record ****;
                        %nsf_ba0;
                        output;
                      end;
                        ********************************************************;
                        ** start of new claim. New claim changes when member  **;
                        ** changes or dos changes since a member can have     **;
                        ** many claims. All encounters on a given day are     **;
                        ** considered belonging to one claim. Also new claim  **;
                        ** if more than 99 claim detail records in one claim  **;
                        ** as per NSF spec field FA0-02                       **;
                        ********************************************************;
                        if first.dos or clmdetct=99 then
                        do;
                          **** Compute claim level counter vars ****;
                          clmrecno+1;
              
                          **** Intialize claim level counter variables ****;
                          **** patient control is const. for all claim records ****;
                          pat=compress('NC' || put(filrecno,z4.) || put(batrecno,z4.) || put(clmrecno,z5.));
                          clmrecct=0;
                          clmdetct=0;
                          clmclmcg=0;
                          ******************************************************;
                          ** these count vars are set to 0 and later increment**;
                          ** in the same do group. A little redundancy but did**;
                          ** that to accommodate multiple C,D,E records if    **;
                          ** that ever happens?                               **;
                          ******************************************************;
                          clmcxxct=0;
                          clmdxxct=0;
                          clmexxct=0;
                          clmfxxct=0;
              
                          **** claim header records ****;
                          clmcxxct+1;
                          clmrecct+1;
                          batclmct+1;
                          batrecct+1;
                          filclmct+1;
                          filrecct+1;
                          dsnrecct+1;
                          %nsf_ca0;
                          output;
              
                          **** Insurance information records ****;
                          clmdxxct+1;
                          clmrecct+1;
                          batrecct+1;
                          filrecct+1;
                          dsnrecct+1;
                          %nsf_da0;
                          output;
              
                          **** claim record ****;
                          clmexxct=1;
                          clmrecct+1;
                          batrecct+1;
                          filrecct+1;
                          dsnrecct+1;
                          %nsf_ea0;
                          output;
              
                        end;
              
                        ***************************************************;
                        ** Start of claim detail records within a claim. **;
                        ** Each record in ENC dataset represents a claim **;
                        ** detail, so all records are output. So far the **;
                        ** records output to file are extra records as   **;
                        ** per nsf specs. Note that output stmt appears  **;
                        ** in each record layout macro                   **;
                        ***************************************************;
                        **** compute claim detail level counters ****;
                        clmrecct+1;
                        clmdetct+1;
                        clmfxxct+1;
                        batrecct+1;
                        batdetct+1;
                        filrecct+1;
                        fildetct+1;
                        dsnrecct+1;
                        ***************************************************************;
                        ** added logic 5/14/98 to limit max amount on XA0 to 9999999 **;
                        ** seqeunce of the foll stmts is important cuz if sum hits   **;
                        ** 9999999 then detail charges is set to 0.                  **;
                        ***************************************************************;
                        if clmclmcg=9999999 then callowed=0;
                        clmclmcg=sum(clmclmcg,callowed);
                        ***************************************************************;
                        ** when detail charges caused total to exceed 9999999, assign**;
                        ** maximum chargeable amount to claim detail record and      **;
                        ** thereafter, encounter detail charges will be set to zero  **;
                        ***************************************************************;
                        if clmclmcg > 9999999 then
                        do;
                           callowed=callowed-(clmclmcg-9999999);
                           clmclmcg=9999999;
                        end;
                        batclmcg=sum(batclmcg,callowed);
                        filclmcg=sum(filclmcg,callowed);
                        **** claim detail record ****;
                        %nsf_fa0;
                        output;
              
                        **** end of claim ****;
                        if last.dos or clmdetct=99 then
                        do;
                           **** claim trailer record ****;
                           filrecct+1;
                           batrecct+1;
                           dsnrecct+1;
                           %nsf_xa0;
                           output;
                        end;
                      **** end of batch ****;
                      **** just last.prov is not enuf cuz the batch may still  ****;
                      **** be incomplete for same provider as seen when prov   ****;
                      **** code is missing. Note that you need the last.dos    ****;
                      **** in the if statement here cuz you need to finish up  ****;
                      **** claim even if it hits the max rec limit mark (50000) ****;
                      if last.prov or (dsnrecct >= &numrecs and last.dos) then
                      do;
                         **** batch trailer record ****;
                         batrecct+1;
                         filrecct+1;
                         dsnrecct+1;
                         %nsf_ya0;
                         output;
                      end;
                    **** end of output file ****;
                    **** end file if no more records in input file or batch    ****;
                    **** number limit reached or max record limit reached      ****;
                    if eof or batrecno=9999 or (dsnrecct >= &numrecs and last.dos) then
                    do;
                       **** file trailer record ****;
                       newfile=1;
                       dsnrecct+1;
                       %nsf_za0;
                       output;
                    end;
                  run;
                %end;
                %else %if &exist = 3 %then
                %do;
                   ** macro to assign constants **;
                   %nsf_000;
              
                   ** create output file **;
                   data _null_;
                     set sasdb.&dsn;
                     %if %upcase(&numfiles) ^= MAX %then
                        %str(if filrecno > &numfiles then stop;);
                     file_nam=compress("&fileloc" || file);
                     file flat filevar=file_nam lrecl=320;
                     select(rec_id);
                       when('AA0')
                       do;
                          %nsf_aa0;
                       end;
                       when('BA0')
                       do;
                         %nsf_ba0;
                       end;
                       when('CA0')
                       do;
                         %nsf_ca0;
                       end;
                       when('DA0')
                       do;
                         %nsf_da0;
                       end;
                       when('EA0')
                       do;
                         %nsf_ea0;
                       end;
                       when('FA0')
                       do;
                         %nsf_fa0;
                       end;
                       when('XA0')
                       do;
                         %nsf_xa0;
                       end;
                       when('YA0')
                       do;
                         %nsf_ya0;
                       end;
                       when('ZA0')
                       do;
                         %nsf_za0;
                       end;
                       otherwise;
                     end;
                   run;
                %end;
              
                ** This format mainly for ease of assignment of status codes. **;
                ** Formatted value is stored in dataset for easy to view      **;
                ** without the need to have this format loaded                **;
                proc format;
                 value $status
                 '00001' = '# of records exceeds specs                  '
                 '00002' = '# of batches exceeds specs'
                 '00003' = 'FA0 record count mismatch'
                 '00004' = 'CA0 record count mismatch'
                 '00005' = 'Total record count mismatch'
                 '00006' = 'Mismatch in file structure'
                 '00007' = 'FTP transfer MAY have failed'
                 '00008' = 'Bytes sent info not available'
                 '00009' = 'FTP transfer did not complete'
                 '00010' = 'Could not connect to host'
                 '00011' = 'File does not exist'
                 '00012' = 'Shell script failure'
                 '00013' = 'Mismatch in bytes transfered'
                 '00014' = 'Update successful'
                 'SUN'   = 'Sun ok, Update pending'
                 'STRATUS' = 'Stratus ok, Update pending'
                 '00015' = 'Ftp ok, Update failed in update creation'
                 '00016' = 'Ftp ok, Update failed in backup creation'
                 '00017' = 'Ftp ok, Update failed in backup existence'
                 '00018' = 'Ftp ok, Update failed in sort step'
                 '00019' = 'Ftp ok, Update failed in merge step'
                 '00020' = 'Ftp ok, Update failed, ** RECOVERY FAILED **'
                 '00021' = "&ftp ok, Update ok"
                 '00022' = 'Update mismatch'
                 '00023' = 'Update OK'
                  ;
                run;
              
                ** need to run the following steps for log files, or to create reports **;
                %if %upcase(&report)=Y or %upcase(&log)=F or %upcase(&log)=B or
                   %upcase(&log)=C %then
                %do;
                  **** Summary Reports to verify accuracy of files   ****;
                  proc sql;
                    create table rpt1 as
                    select file, filrecno, batrecno, fildetct, filrecct, filclmct, filclmcg, dsnrecct
                    from sasdb.&dsn(where=(rec_id='ZA0'))
                    order by filrecno;
                    ** Ending file number. Along with filstrt macro var, used for  **;
                    ** naming log and list files, and identify job being run       **;
                    reset noprint;
                    select max(filrecno) into :fileend
                     from rpt1;
                  quit;
              
                  **** Number of records and record types per file ****;
                  proc freq data=sasdb.&dsn;
                   tables filrecno*rec_id/list missing out=rpt2 noprint;
                  run;
              
                  proc transpose data=rpt2 out=rpt3;
                    id rec_id;
                    var count;
                    by filrecno;
                  run;
              
                  data rpt4(drop=cmd rec _name_ _label_ dt tm);
                    merge rpt3
                          rpt1;
                    by filrecno;
                    format file_dt mmddyy8. file_tm time.;
                    length status $ 50  cmd rec $ 100  sasdsn $ 8  lfile $ 200 testprod $ 4
                           fil_size ftp_size 8  ftp_date ftp_time 4 dt $ 10 tm $ 5;
                    testprod=symget('testprod');
                    ** name of sas dataset used to create file **;
                    sasdsn=symget('dsn');
                    ** need to deal with datetime more elegantly **;
                    file_dt=today();
                    file_tm=time();
                    status='OK';
                    ** for more sensible way of looking at dollars **;
                    filclmcg=filclmcg/100;
                    ** 100 recs tolerance on max file size. self defined **;
                    ** spec cuz number of records may exceed limit by    **;
                    ** small number to allow graceful closure of batch   **;
                    if dsnrecct > (100+&numrecs) then status='00001';
                    if batrecno > 9999 then status='00002';
                    if fildetct ^= fa0 then status='00003';
                    if filclmct ^= ca0 then status='00004';
                    if filrecct ^= sum(ba0,ca0,da0,ea0,fa0,xa0,ya0) then status='00005';
                    if aa0 ^= za0 or ba0 ^= ya0 or ca0 ^= xa0 then status='00006';
                    %if %upcase(&ftp)=SUN or %upcase(&ftp)=STRATUS %then
                    %do;
                      if status ='OK' then
                      do;
                        cmd = 'sh /nsf/scripts/' || lowcase("&testprod") || '/' || lowcase("&ftp") || ' ' || file;
                        call system(cmd);
                        %if &sysrc=0 %then
                        %do;
                           ** get current date, time echo to console **;
                           dt=put(today(),mmddyy8.);
                           tm=put(time(),time5.);
                           cmd='echo "File # ' || compress(put(filrecno,8.)) || ' Ftp OK  ' || dt || tm || '"';
                           call system(cmd);
                           lfile=compress("&fileloc" || 'ftplog/' || file || '.log');
                           infile in1 filevar=lfile missover pad end=last;
                           input @1 rec $char100.;
                           if index(rec,file) > 0 then
                           do;
                             fil_size=input(scan(rec,5,' '),8.);
                             input @1 rec $char100.;
                             if index(upcase(rec),'CONNECTED TO') > 0 then
                             do;
                               do until(last or index(upcase(rec),'TRANSFER COMPLETE') > 0);
                                 input @1 rec $char100.;
                               end;
                               if index(upcase(rec),'TRANSFER COMPLETE') > 0 then
                               do;
                                 do until(last or index(upcase(rec),'BYTES SENT IN') > 0);
                                   input @1 rec $char100.;
                                 end;
                                 if index(upcase(rec),'BYTES SENT IN') > 0 then
                                 do;
                                    ftp_size=input(scan(rec,1,' '),8.);
                                    ftp_et=input(scan(rec,5,' '),8.3);
                                    do until(last or index(upcase(rec),'GOODBYE') > 0);
                                      input @1 rec $char100.;
                                    end;
                                    if index(upcase(rec),'GOODBYE') > 0 then
                                    do;
                                       ftp_date=today();
                                       ftp_time=time();
                                       if ftp_size >= fil_size then status=upcase("&ftp");
                                       else status='00013';
                                    end;
                                    else status='00007';
                                 end;
                                 else status='00008';
                               end;
                               else status='00009';
                             end;
                             else status='00010';
                           end;
                           else status='00011';
                        %end;
                        %else
                        %do;
                          ** get current date, time echo to console **;
                          dt=put(today(),mmddyy8.);
                          tm=put(time(),time5.);
                          cmd='echo "File # ' || compress(put(filrecno,8.)) || ' Ftp Failed  ' || dt || tm || '"';
                          call system(cmd);
                          %str(status='00012';);
                        %end;
                      end;
                    %end;
                    status=put(status,$status.);
                    label filrecno = 'File #'
                          file     = 'File name'
                          sasdsn   = 'SAS*Dataset'
                          file_dt  = 'Creation*Date'
                          file_tm  = 'Creation*Time'
                          ftp_date = 'FTP*Date'
                          ftp_time = 'FTP*Time'
                          ftp_size = 'FTP*Size'
                          ftp_et   = 'Time to*FTP'
                          fil_size = 'File*Size'
                          status   = 'Status'
                          aa0      = '# of AA0 records'
                          ba0      = '# of BA0 records'
                          ca0      = '# of CA0 records'
                          da0      = '# of DA0 records'
                          ea0      = '# of EA0 records'
                          fa0      = '# of FA0 records'
                          xa0      = '# of XA0 records'
                          ya0      = '# of YA0 records'
                          za0      = '# of ZA0 records'
                        testprod   = 'AA0-21 TEST/PROD IND'
                          ;
                  run;
              
              
                  proc sort data=rpt4;
                    by filrecno;
                  run;
              
                %end;
              
              
                ** File level log **;
                %if %upcase(&log)=F or %upcase(&log)=B %then
                %do;
                   data sasdb.&testprod.log(compress=yes);
                     merge sasdb.&testprod.log
                           rpt4;
                     by filrecno;
                     ** in case prior jobs bombed half way and created **;
                     ** a spurious record in dataset.                  **;
                     if filrecno=. then delete;
                   run;
              
                   ** clean up **;
                   proc datasets library=work nolist;
                     delete rpt1 rpt2 rpt3 rpt4;
                   quit;
              
                %end;
              
              
                ** claim level log, updated only if file sent to stratus **;
                %if %upcase(&log)=C or %upcase(&log)=B %then
                %do;
                   ** run the foll 2 steps only when date changes. this **;
                   ** dataset remains same for all ipas for any qtr so  **;
                   ** run only once. If multiple qtrs this has to be run**;
                   ** several times. Hence call macro update from inside**;
                   ** this macro                                        **;
                   %macro quarter(strt_dt,end_dt);
                     proc datasets lib=work nolist;
                       delete updtlog;
                     quit;
              
                     proc sql;
                       create table quarter as
                       select prov, mem, dos, proc, a.filrecno, a.batrecno, a.source
                         from sasdb.&dsn(where=(rec_id='FA0')) a,
                              sasdb.&testprod.log b
                        where a.filrecno=b.filrecno
                          and upcase(status) =upcase("&ftp ok, Update pending")
                          and  "&strt_dt"d <= dos <= "&end_dt"d
                         order by prov, mem, dos, proc;
                     quit;
              
                     proc sql;
                       reset noprint;
                       create table srclist as
                        select distinct(source) from quarter;
                     quit;
              
                     **** File to call update macro ****;
                     filename updt 'updt.sas';
              
                     data _null_;
                       length libname memname $ 8 cmd $ 100
                              qtrs strt_dt end_dt 8;
                       set srclist(where=(source ^= ''));
                       by source;
                       file updt;
                       retain qtrs;
                       if first.source then qtrs=0;
                       libname=put(source,$ipalib.);
                       ** for each source or ipa, create member dataset names   **;
                       ** based on number of quarters and year. Note that these **;
                       ** member names are same for all ipas and hence no       **;
                       ** reference to lib or source here                       **;
                       do while(qtrs < &qtrs);
                         strt_dt=intnx('qtr',intnx('qtr',"&date"d,qtrs),0);
                         end_dt=intnx('day',intnx('qtr',intnx('qtr',"&date"d,qtrs),1),-1);
                         memname=compress('p' || put(strt_dt,year2.) ||
                                             'q' || qtr(strt_dt));
                         ** put date informat so can be easily read for debugging **;
                         cmd='%update(' || libname || ',' || source || ',' || memname || ');';
                         put @1 cmd;
                         qtrs=qtrs+1;
                         ** needed only for debugging **;
                         output;
                       end;
                     run;
              
                     %include updt;
              
                     X 'rm updt.sas';
              
                   %mend quarter;
              
                   ** note that lib and source values are not always identical **;
                   ** hence need to pass them as separate parms. Note date is  **;
                   ** passed as formatted value for easier debugging           **;
                   %macro update(lib,source,mem);
                     %let status=;
                     %let toupdate=0;
                     %let updated=0;
              
                     ** take additional precautions per discussion with Ken **;
                     ** this avoids accidentally using older copy of work   **;
                     ** dataset, may never happen if job run in batch       **;
                     proc datasets lib=work nolist;
                       delete &source;
                     quit;
              
                     ****************************************************************;
                     ** reasonable to assume every claim has a claim header record **;
                     ** and save the sql  step some time. Also dont   use nodup    **;
                     ** since you may want to keep track if duplicates were sent   **;
                     ** Verified with Ken that encounters with a given dos will    **;
                     ** always appear in dataset with the appropriate quarter name **;
                     ** Thus merge back with individual quarter dataset is a good  **;
                     ** idea.                                                      **;
                     ****************************************************************;
                      ****** need to verify status code logic in select ****;
                     ** note that batrecno in sasdb.&testprod is total num of batches  **;
                     ** while batrecno in Pxxxx dataset is the actual batch number in  **;
                     ** file containing the encounter                                  **;
                     proc sql;
                       create table &source(drop=source) as
                       select *
                        from quarter
                        where upcase(source)="&source";
                       reset noprint;
                       select count(*) into :toupdate
                         from &source;
                     quit;
              
                     data _null_;
                       call symput('existrc',exist("work.&source"));
                     run;
              
              
                     *** update dataset exist so go ahead **;
                     %if &existrc gt 0 and &toupdate gt 0 %then
                     %do;
                       ** delete older copy..unlikely to happen in batch ***;
                       proc datasets lib=work nolist;
                        delete &mem;
                       quit;
              
                       proc copy in=&lib out=work;
                        select &mem;
                       run;
              
                       %if &sysrc eq 0 %then
                       %do;
                         data _null_;
                           call symput('existrc',exist("work.&mem"));
                         run;
              
                         ** only if back up copy exists, go forward **;
                         %if &existrc > 0 %then
                         %do;
              
                           proc sort data=&lib..&mem;
                             by prov mem dos proc;
                           run;
              
                           %if &sysrc eq 0 %then
                           %do;
                             ****************************************************************;
                             ** Now begin process master files. Per Ken Kwan it seems like **;
                             ** a safer approach is necessary to ensure integrity of master**;
                             ** files. Hence monitor return codes and receover dataset in  **;
                             ** addition to SASs normal recovery. Note logic to see which  **;
                             ** records have been updated. Var is called _updated and      **;
                             ** hopefully no var by that name in the dataset.              **;
                             ****************************************************************;
                             data &lib..&mem(drop=_updated);
                               merge &lib..&mem(in=in_mem)
                                     &source(in=in_src);
                               length _updated 4;
                               by prov mem dos proc;
                               retain _updated;
                               if in_mem and in_src then _updated+1;
                               call symput('updated',_updated);
                             run;
                             %if &sysrc eq 0 %then %let status=00021;
                             %else
                             %do;
                                %let status=00019;
                                ** need to improve logic here to store this info and **;
                                ** recover from this failure                         **;
                                proc copy in=work out=&lib;
                                  select &mem;
                                run;
                                %if &sysrc ne 0 %then %let status=00020;
                             %end;
                           %end;
                           %else %let status=00018;
                         %end;
                         %else %let status=00017;
                       %end;
                       %else %let status=00016;
              
                     %end;
                     %else %let status=00015;
              
                     data updt;
                       length memname libname $ 8 toupdate updated 4 status $ 50;
                       memname=symget('mem');
                       libname=symget('lib');
                       toupdate=symget('toupdate');
                       updated=symget('updated');
                       if toupdate ne updated then status='00022';
                       else status='00023';
                       status=put(status,$status.);
                       output;
                     run;
              
                     proc append base=updtlog data=updt;
                     run;
              
                     ** If there is a revised status then update testlog dataset **;
                     %if &status ne  %then
                     %do;
                       **********************************************************;
                       ** if update was successful then update testlog dataset **;
                       ** if not recover back up copy and set the staus as wasg**;
                       ** Leaving status msg as is, enables picking up records **;
                       ** during rerun. Note it is possible that some claims   **;
                       ** in a file could have been updated while others not,  **;
                       ** if there is a failure in update macros. This is cuz  **;
                       ** a file has records from multiple ipas and all        **;
                       ** for a given ipa cannot be assuredly found in any     **;
                       ** predeterminable file number                          **;
                       **********************************************************;
                       proc sql;
                         reset noprint;
                         create table log
                         as select distinct(filrecno)
                         from &source;
                       quit;
              
                       data log;
                         set log;
                         length status $ 50;
                         status="&status";
                         status=put(status,$status.);
                       run;
              
                       proc sort data=sasdb.&testprod.log;
                        by filrecno;
                       run;
              
                       ** update logs irrespective of whether update failed or worked **;
                       data sasdb.&testprod.log;
                          merge sasdb.&testprod.log
                                log;
                          by filrecno;
                       run;
                     %end;
              
                     ** clean up **;
                     proc datasets lib=work nolist;
                       delete &source &mem;
                     quit;
              
                   %mend update;
              
                   ** Generte code to call quarters macro. Quarters macro is called **;
                   ** once for each quarter specified in date range in nsf.sas. In  **;
                   ** turn quarters macro calls update macro once for each ipa for  **;
                   ** that quarter. Its in the update macro that analysis datasets  **;
                   ** get updated                                                   **;
                   filename qtrs  'qtrs.sas';
              
                   data _null_;
                     length qtrs strt_dt end_dt 8 cmd $ 100;
                     file qtrs;
                     qtrs=0;
                     ** reference to lib or source here                       **;
                     do while(qtrs < &qtrs);
                       strt_dt=intnx('qtr',intnx('qtr',"&date"d,qtrs),0);
                       end_dt=intnx('day',intnx('qtr',intnx('qtr',"&date"d,qtrs),1),-1);
                       ** put date informat so can be easily read for debugging **;
                       cmd='%quarter(' || put(strt_dt,date9.) || ',' || put(end_dt,date9.) || ');';
                       put @1 cmd;
                       qtrs=qtrs+1;
                       ** needed only for debugging **;
                       output;
                     end;
                   run;
              
                   %include qtrs;
              
                   X 'rm qtrs.sas';
                %end;
              
              
                ** print reports at the end to show updated status message **;
                %if %upcase(&report)=Y %then
                %do;
              
                  options ls=155 ps=55 pageno=1 ;
                  proc print data=sasdb.&testprod.log double split='*' noobs;
                    var filrecno file sasdsn file_dt file_tm ftp_date ftp_time fil_size
                        ftp_size dsnrecct batrecno filclmct fildetct status;
                    ** select files pertaining to this job only **;
                    where  &filestrt < filrecno <= &fileend;
                    format file_tm ftp_time time. file_dt ftp_date mmddyy8.;
                    title1 "Summary of records processed";
                     sum dsnrecct fil_size ftp_size batrecno filclmct fildetct;
                  run;
              
                  %if %upcase(&log)=C or %upcase(&log)=B %then
                  %do;
                    proc print data=updtlog split='*';
                      var libname memname toupdate updated status;
                      sum toupdate updated;
                      label libname='SAS Libname'
                            memname='SAS Dataset*Name'
                            toupdate='# of obs*to update'
                            updated='# of obs*updated'
                            ;
                      title1 'SAS datasets updated';
                    run;
                  %end;
              
                %end;
              
                %if &end= %then %let dsnstem=%left(&start);
                %else %let dsnstem=%left(&start)_%left(&end);
                %if &fileend= or &fileend=1 %then %let filestem=%left(%eval(&filestrt+1));
                %else %let filestem=%left(%eval(&filestrt+1))_%left(&fileend);
              
                %let lstfile=&fileloc.saslst/nsf&filestem._&dsnstem..lst;
                %let logfile=&fileloc.saslog/nsf&filestem._&dsnstem..log;
              
                filename scr '/nsf/scripts/finish';
              
                data _null_;
                  file scr;
                  length logfile lstfile $ 200;
                  logfile=symget('logfile');
                  lstfile=symget('lstfile');
                  put   @1 '#! /bin/sh'
                      / @1 'mv /nsf/prog/nsf.log ' logfile
                      / @1 'echo "    Log File was saved as ' logfile '"'
                      / @1 'mv /nsf/prog/nsf.lst ' lstfile
                      / @1 'echo "    List File was saved as ' lstfile '"'
                      / @1 'print_landscape ' lstfile
                      ;
                run;
              
                filename scr clear;
              %mend nsf_out;
             

SAS Procedures

"Procs", abbreviation for procedures, operate on sets of records (datasets). In most situations data is pre-processed in a data step before calling a Proc. Base SAS alone has 75 Procs while all other SAS products combined have about 210 Procs. Most data processing tasks can be accomplished by one or a combination of several Procs.

For example there is a "Proc Sort" to sort data, there is a "Proc Report" to report and print data, there is a "Proc Univariate" for descriptive statistics, "Proc Reg" for regression analysis, etc. With so many Procs available, it is usually a lot simpler to identify one or several Procs that can do the job than build program logic in a data step.

Writing efficient SAS programs requires planning the sequence in which Data Step and Proc based processing takes place. Having an understanding of the scope of various Procs is necessary to determine whether processing can take place via a Proc or has to be done in a Data Step. Procs are more efficient and requires fewer lines of code, whereas Data Step programming offers greater flexibility.

Macro Language

SAS Macro facility is employed to automate processes in SAS software. SAS macro language can conditionally and iteratively modify SAS language statements or even create new ones. It reduces the amount of program statements that have to be written to accomplish a task. SAS macro facility gives the ability to break a large SAS language process into smaller sub-processes and link them to achieve efficiencies.

Output Delivery System

As the name implies, Output Delivery System in the SAS System deals with the presentation of results. Typically results from SAS are generated by printing, graphing or analytical procedures. These procedures logically process data in an input SAS dataset based on user parameters specified in the SAS code and yield output objects in the form of tables and graphs. ODS makes it possible to select output object and customize their format suited to different destinations such as a browser, printer, spreadsheet, PDF, image files among others.

Handling output objects from a proc output is useful when building automatic analytical processes. For instance results from a process maybe directed to a formatted Excel spreadsheet. When a process is run repeatedly resultant data can be captured in the same output format without any manual formatting. ODS, in addition to making results visually appealing, is very useful for automating analytical processes.

SAS SQL

SQL within SAS software is SAS implementation of Structured Query Language via a procedure (Proc SQL). It is an enhancment of ANSI standard SQL and well suited to processing data stored in a relational format either in a SAS dataset or Relational Database software. SAS SQL can be used to query, subset, join, sort and group data without using Data steps or other Procs. It supports use of Data Step functions in Select statement bringing in the power of SAS language to SAS SQL language. In some cases it may be simpler to write code using SAS SQL. For instance it is easier to generate a cartesian product of two datasets using Proc SQL since that is the default output of a SQL join. However complex SQL code may not be as easy to main as data step code.

SQL Pass Thru facility is an extension of the SQL procedure that enables sending of DBMS specific SQL statements to an external database via SAS/Access products and retrieve results into a SAS dataset. By using DBMS specific SQL it can be more efficient to preprocess data in the DBMS and retrieve a resultset into SAS for further processing. It is therefore important to understand when SQL needs to be employed within SAS programs to derive maximum benefit.